Multi-environment Speaker Verification
Authors
Abstract
Here we investigate an instance of the abstract problem of pattern recognition under mismatch conditions: models of phenomena are built with data collected in the training environment but must be used to recognize the same phenomena in another environment. The specific problem is speaker verification, where the training and testing data for each speaker can come from one of many different microphones. We use data, unlabeled with respect to channel or environment, to build, unsupervised, an easily extensible, hierarchical structure that at the finest level consists of individual speaker models, but at the coarsest level is a collection of all of the models. We then have the ability to automatically generate evolving background models from any layer of our hierarchical model when we wish to perform a verification. We give results to show that the richer our hierarchical structure, the better we do in terms of verification.
1. INTRODUCTION
We consider the problem of speaker verification under mismatched conditions when the number of environments in which training and testing data can be collected is large and variable. We describe our technique for dealing with multiple environments, making a special note of the fact that it is unsupervised and incremental, and so can potentially be used in situations where we do not know beforehand the nature of the environments in which we will be collecting data. For example, take situations in which the only way we know about new environments is through enrollment data. In the multi-environment verification context, the technique is best described as one allowing easy modification of the normalizing background to reflect data from perhaps new and unknown environments, as those environments are seen in enrollment. Specifically, we analyze performance on data collected from 8 different microphones in a relatively noisy environment, a cafeteria.
We show that we are able to obtain an improvement in multi-environment speaker recognition performance by adding information about multiple environments solely through our enrollment process, which is efficient. Traditional approaches to such normalization in speaker verification have involved the supervised use of data from
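As a rough illustration of the idea described in the abstract (a hierarchy whose finest level holds individual speaker models and whose coarser levels group them, with a background set drawn from any chosen layer), the following Python sketch may help. It is not the authors' method: the representation of each speaker model as a single mean vector, the greedy closest-pair merging, the Euclidean distance, and every name (`build_tree`, `background`, `verify_score`, `levels_up`) are assumptions made purely for illustration. It also rebuilds the tree from scratch rather than updating it incrementally at enrollment.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class Node:
    """Tree node; a leaf carries one (hypothetical) speaker model."""

    def __init__(self, center, children=(), speaker=None):
        self.center = center
        self.children = list(children)
        self.speaker = speaker
        self.parent = None
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def build_tree(speaker_models):
    """Greedy bottom-up merging of the closest pair until one root is left.

    Assumes at least one speaker; no channel or environment labels are used.
    """
    nodes = [Node(list(v), speaker=name) for name, v in speaker_models.items()]
    while len(nodes) > 1:
        pairs = ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes)))
        i, j = min(pairs, key=lambda p: dist(nodes[p[0]].center, nodes[p[1]].center))
        a, b = nodes[i], nodes[j]
        merged = Node(mean([l.center for l in a.leaves() + b.leaves()]), children=[a, b])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def find_leaf(root, speaker):
    """Return the leaf node for a given speaker name."""
    for leaf in root.leaves():
        if leaf.speaker == speaker:
            return leaf
    raise KeyError(speaker)


def background(leaf, levels_up):
    """Background set: all other leaves under the ancestor `levels_up` layers above."""
    node = leaf
    for _ in range(levels_up):
        if node.parent is None:
            break
        node = node.parent
    return [l for l in node.leaves() if l is not leaf]


def verify_score(test_vec, leaf, levels_up=1):
    """Distance to the closest background model minus distance to the claimed model.

    Positive values favor accepting the identity claim.
    """
    cohort = background(leaf, levels_up)
    own = dist(test_vec, leaf.center)
    if not cohort:  # single-speaker tree: nothing to normalize against
        return -own
    return min(dist(test_vec, l.center) for l in cohort) - own
```

In this sketch, a larger `levels_up` draws the background from a coarser layer of the tree, so the normalizing set grows from the nearest neighbors of the claimed speaker toward the full collection of enrolled models.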
Similar resources
Using Exciting and Spectral Envelope Information and Matrix Quantization for Improvement of the Speaker Verification Systems
Speaker verification from a few spoken words of a sentence has many applications. Many methods, such as DTW, HMM, VQ, and MQ, can be used for speaker verification. We applied MQ for its precise, reliable, and robust performance with computational simplicity. We also used pitch frequency and log gain contour to further improve system performance.
Robust speaker verification using short-time frequency with long-time window and fusion of multi-resolutions
This study presents a novel approach to feature analysis for speaker verification. There are two main contributions in this paper. First, the feature analysis of short-time frequency with long-time window (SFLW) is a compact feature for the efficiency of speaker verification. The purpose of SFLW is to take account of short-time frequency characteristics and long-time resolution at the same time. ...
Multi-task learning for text-dependent speaker verification
Text-dependent speaker verification uses short utterances and verifies both speaker identity and text contents. Due to this nature, traditional state-of-the-art speaker verification approaches, such as i-vector, may not work well. Recently, there has been interest in applying deep learning to speaker verification; however, in previous works, standalone deep learning systems have not achieved sta...
Stream-weight optimization by LDA and adaboost for multi-stream speaker verification
This paper proposes an automatic stream-weight optimization method for noise-robust speaker verification using multi-stream HMMs integrating spectral and prosodic information. The paper first shows the effectiveness of the multi-stream technique in our speaker verification framework. Next, a stream-weight adaptation method combining the linear discriminant analysis (LDA) and Adaboost techniques...
Speaker verification over cellular networks
This paper demonstrates the performance gap between speaker verification over land-line telephone networks and speaker verification over cellular networks. The paper shows that the cellular coding accounts for only a fraction of the observed performance gap. A dual-channel corpus, with speakers recorded simultaneously in a land-line phone and a cellular phone, is used to study the effect of the...